Modelling diachrony in dictionaries
Abstract
Introduction: The variety of lexical structures
Lexical data appear in a wide variety of forms. These range from basic morpho-syntactic structures (Romary et al., 2004) intended for language engineering applications to major editorial projects that cover multiple levels of lexicographic description: morphological information, syntactic constructs, sense-related information (definitions, examples, usage notes, etc.) and historical information. Entries also vary in their internal organization. Among other factors, the fundamental choice between an onomasiological representation (concept to word) and a semasiological representation (word to sense) directly affects the internal structure of entries, as well as the choice of descriptors that can be attached to them. From a computational point of view, this variety rules out a single data structure that would fit all possible needs, whereas one would like uniform access to similar information across heterogeneous lexical sources. This tension has fuelled strong debates, leading for instance to the ubiquitous Print Dictionary chapter of the TEI (Text Encoding Initiative), which tries to combine structured and unstructured views of lexical entries. Still, in this paper we want to show that coherent modelling principles can be applied to this variety of structures while providing a precise account of complex sub-components, such as diachronic information, as they appear in dictionaries with wide lexical coverage. We also want to show that such principles can guide the evolution of the TEI towards a more flexible data model for the concrete representation of dictionaries.
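To make the semasiological/onomasiological contrast concrete, here is a minimal sketch that starts from a word-to-sense organization and derives a concept-to-word view from it. It is an illustration only, not the model proposed in the paper: the toy entries, the concept identifiers (e.g. `RIVER_EDGE`) and the function `to_onomasiological` are all hypothetical, and real entries would carry far richer descriptors.

```python
from collections import defaultdict

# Hypothetical toy data in a semasiological layout: each headword maps to its
# senses, and each sense is tagged with an invented concept identifier.
SEMASIOLOGICAL = {
    "bank": [
        {"concept": "FINANCIAL_INSTITUTION", "definition": "an institution that holds money"},
        {"concept": "RIVER_EDGE", "definition": "the sloping land beside a river"},
    ],
    "shore": [
        {"concept": "RIVER_EDGE", "definition": "the land along the edge of water"},
    ],
}


def to_onomasiological(entries):
    """Invert a word-to-sense mapping into a concept-to-words mapping (hypothetical helper)."""
    by_concept = defaultdict(list)
    for headword, senses in entries.items():
        for sense in senses:
            by_concept[sense["concept"]].append(headword)
    # Sort each word list so the result does not depend on insertion order.
    return {concept: sorted(words) for concept, words in by_concept.items()}


if __name__ == "__main__":
    for concept, words in sorted(to_onomasiological(SEMASIOLOGICAL).items()):
        print(concept, "->", ", ".join(words))
```

Note that the inverted view keeps only the headwords, while the definitions remain attached to the senses of the original layout. This loss of information is one reason the two organizations lead to different entry structures and descriptors.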
Similar resources
Optimized Selection of Intonation Dictionaries in Corpus Based Intonation Modelling
Data scarcity in corpus-based intonation modelling for TTS applications is addressed. We propose to apply a searching process to a list of dictionaries of classes of intonation patterns previously trained from corpus to avoid problems associated with the scarce number of samples in the classes. Results indicate that better results are obtained in comparison with previous alternatives where the ...
On multiword lexical units and their role in maritime dictionaries
Multi-word lexical units are a typical feature of specialized dictionaries, in particular monolingual and bilingual maritime dictionaries. The paper studies the concept of the multi-word lexical unit and considers the similarities and differences of their selection and presentation in monolingual and bilingual maritime dictionaries. The work analyses such issues as the classification of multi-w...